Cost-sensitive decision tree ensembles for effective imbalanced classification
نویسندگان
چکیده
Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of reducing overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, and are typically based on oversampling, undersampling or cost-sensitive classification. In this paper, we introduce an effective ensemble of cost-sensitive decision trees for imbalanced classification. Base classifiers are constructed according to a given cost matrix, but are trained on random feature subspaces to ensure sufficient diversity of the ensemble members. We employ an evolutionary algorithm for simultaneous classifier selection and assignment of committee member weights for the fusion process. Our proposed algorithm is evaluated on a variety of benchmark datasets, and is confirmed to lead to improved recognition of the minority class, to be capable of outperforming other state-of-the-art algorithms, and hence to represent a useful and effective approach for dealing with imbalanced datasets.
منابع مشابه
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
متن کاملEnsemble Classification and Extended Feature Selection for Credit Card Fraud Detection
Due to the rise of technology, the possibility of fraud in different areas such as banking has been increased. Credit card fraud is a crucial problem in banking and its danger is over increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...
متن کاملEnsembles of (α)-Trees for Imbalanced Classification Problems
This paper introduces two kinds of decision tree ensembles for imbalanced classification problems, extensively utilizing properties of α-divergence. First, a novel splitting criterion based on α-divergence is shown to generalize several wellknown splitting criteria such as those used in C4.5 and CART. When the α-divergence splitting criterion is applied to imbalanced data, one can obtain decisi...
متن کاملUsing Model Trees and Their Ensembles for Imbalanced Data
Model trees are decision trees with linear regression functions at the leaves. Although originally proposed for regression, they have also been applied successfully in classification problems. This paper studies their performance for imbalanced problems. These trees give better results that standard decision trees (J48, based on C4.5) and decision trees specific for imbalanced data (CCPDT: Clas...
متن کاملClassification of SchoolNet Data
SchoolNet Data, mainly educational material, was authored by SchoolNet to make it easy for teachers and learners to find educational resources in various subjects. The task of automatically assigning subject categories to learning materials has become one of the key steps for organizing online information. Since hand-coding classification rules is costly or even impractical, most modern approac...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Appl. Soft Comput.
دوره 14 شماره
صفحات -
تاریخ انتشار 2014